courses by using artificial neural networks. It was also aimed to show the importance of the
independent variables in the prediction. In the data set used in this study, variables of gender,
type of education, field of study in high school and transcript information of 14 courses
including end-of-term letter grades were collected. The artificial neural network performance
in this study was R=0.84 for the Science and Technology Education I course and R=0.84 for the
Science and Technology Education II course, which shows that the network performance overlaps
with the findings obtained from the related studies.

Keywords: Elementary Education, Science and Technology Teaching, Data Mining


1. INTRODUCTION 


Although the key study on artificial neural networks (ANN) was based on the models of 
McCulloch and Pitts (1943), which they presented in “A logical calculus of the ideas”, it is
possible to say that the use of artificial neural networks constructed by means of computers
dates back to the 1950s (Heaton, 2008). Historically, basic network architectures were first
announced by Frank Rosenblatt as “The Perceptron.” Then, chronologically, the progress has been
as follows (Graupe, 2007): 


• “The Artron” by R. Lee in 1950, 
• “The Adaline” by B. Widrow in 1960, and 
• “The Madaline” by, again, B. Widrow in 1988. 


The ANN models developed in the subsequent years are generally based on the working principles 
of these four models. The use of this relatively new method of analysis is rapidly increasing
as many artificial neural network architectures are developed. Today, artificial neural
networks are used in many different fields, including brain and cognition (Gupta, Molfese, &
Tammana, 1995), scientific and technical information (Polanco, Francois, & Keim, 1998),
environmental planning, design and architecture (Raju, Sikdar, & Dhingra, 1996; Wyatt, 1996),
geographic information systems (Foody, 1995), grammar (Vokey & Higham, 2004), business and
economics (Selim, 2009; Suzuki, 2001; Tang, 2009), industrial engineering (Azadeh, Saberi, &
Anvari, 2011), energy (Geem, 2010), ergonomics (Nussbaum & Chaffin, 1996), ethology (Snyder,
1998), weather forecasting (Ghiassi, Saidane, & Zimbra, 2005), air pollution (Cai, Yin, & Xie,
2009), human behavior and computers (Stevens, Ikeda, Casillas, Palacio-Cayetano, & Clyman,
1999), job security (de Haen, 2009), paleontology (Anemone, Emerson, & Conroy, 2011),
psychiatry (Cohen, 1994), psychology (Verhagen & Scott, 2004), psycho-sociology (Dowman &
Ben-Avraham, 2008), health (Alam & Briggs, 2011), telematics and informatics (Sim, Tan, Wong,
Ooi, & Hew, 2014), prevention and analysis of traffic accidents (Wei & Lee, 2007), tourism
management (Palmer, Montano, & Sese, 2006), expert systems (Ahn, Cho, & Kim, 2000), and
education and training (Demir, 2015). 


When the successful applications in these fields are examined, it can be said that artificial 
neural networks are used especially when there are non-linear, multidimensional, incomplete, 
imperfect and error-prone data, and where there is no mathematical model for solving a problem 
(Cirak & Cokluk, 2013). In order to better understand this method, which is trying to imitate 
the functioning of the human brain, its structure and its basic components need to be examined. 


The structure of this method is based on the functions of the human brain. The cells in the 
brain provide humans with the ability to use and practice their thinking, reasoning, and 
experience (Gonzalez & DesJardins, 2002). Artificial neural networks aim to make use of these 
abilities of the human brain to automatically generate new information through the features of 
learning, discovery and construction without any help (Cirak, 2012; Yavuz & Deveci, 2012). It 
is important to know how a neural network works biologically to be able to better understand 
the working principle of artificial neural networks and the elements of a network. 


The biological nervous system in the human body consists of a three-layered structure 
that includes receiving data, interpreting them, and making decisions (Kuyucu, 2012). The brain 
is in the center of this system shown as the “Nervous System Block Diagram” in Figure 1. The 
brain receives information, makes sense of it and makes a proper decision. Arrows from left to 
right convey the information-bearing signals into the system through feedforward, and the 
arrows from right to left through feedback (Haykin, 2009). Receptors transmit stimuli from the 
human body and environment to the neural network by turning them into electrical impulses; 
the effectors turn these impulses into understandable reactions as an output of the system 
(Haykin, 2009). 


Figure 1. Nervous system block diagram (Haykin, 2009). 


The basis of the biological neural networks is the nerve cells. The number of nerve cells 
in the cortex of the human brain is estimated to be about 10 billion (Cuhadar, 2006). A nerve 
cell is composed of a cell body, a dendrite and an axon. At the macro level, in nerve cells, which 
work in a way similar to the working principle of the nervous system, incoming stimuli are 
transmitted to the cell body via dendrites. Outputs generated after operations in the cell body 
are transmitted to other nerve cells via axons. 


The similarity of the working principle of the artificial neural networks and the elements 
of the network with the biological neural cells is shown in Figure 2. The connections between 
the cells correspond to axons and dendrites. The weight factors (Wk) correspond to synapses. In 
an artificial nerve cell, stimuli coming to the cell (X1, X2, ... Xm) are, based on the effect
of the weight factors (Wk1, Wk2, ... Wkm), converted into outputs by a nonlinear activation
function that takes into account the state or grade of the intracellular synaptic weights
(Koç, Balas, & Arslan, 2004). 


Figure 2. Similarity between biological and artificial neural networks (Arbib, 2003a; Haykin, 2009b). 


An artificial neural network consists of six basic components: input layer, weights, hidden
layer, summing function, activation function and output layer. Biologically, a neuron
corresponds to a processing element in artificial neural networks. Dendrites serve as inputs,
and the cell body corresponds to the transfer functions used in the network architecture.
Weights used in artificial neural networks fulfill the function of synapses, while the axons
represent the neuronal output of the artificial neural network. These basic components can be
explained as follows. 


Input Layer: Input signals (X1, X2, ... Xm) sent to the input layer are transmitted to the next 
layer without any statistical processing. The only function of this layer is to transmit the data to 
the next layer (Yurdakul, 2014). In addition, the input signals on this layer can be more than 
one. The input signal sent to the input layer may be any of the texture, mathematical value, 
audio signal, or image processing elements according to the type of the network. 


Synaptic Weights: Synaptic weights (Wk1, Wk2, ... Wkm) are the statistical coefficients 
indicating the importance of the input data to the hidden layer and its effect on the learning of 
the network. A positive or negative weight is generated for each input signal (Kuyucu, 2012). 
All links that relate the input signals to the other layers have different weight values. This
varied assignment of values ensures that synaptic weights are effective on all processing
elements (Yurdakul, 2014). In artificial neural networks, information is represented by these
weights. Therefore, synaptic weights are an important variable affecting the design and
performance of a network (Emir, 2013). In determining this variable, it is assumed that the
input signals follow some statistical distribution. 


Hidden Layer: Intermediate layers providing information exchange between the input and 
output layers are called hidden layers (Şengür & Tekin, 2013). An artificial neural network can
have more than one hidden layer. However, depending on the problem situation, if there are not
enough hidden layers, the network fails to learn. If there are more hidden layers than
necessary, the network memorizes the data it was given instead of learning from them
(Yurdakul, 2014). 


Summing Junction: This junction, also called merging, calculates the net input to the cell 
(Adiyaman, 2007; Yavuz & Deveci, 2012). It does so by taking the linear combination of the
input values and their respective weights (Emir, 2013). The summing function can differ between
models depending on the structure of the network architecture. In some models the input values
are important, while in others the number of inputs may be an important variable. Taking this
difference into account, the most appropriate summing function is determined by trial and error
(Cuhadar, 2006; Kuyucu, 2012). 


Activation Function: The activation function, also called the transfer function, determines 
the output that the cell produces in response to the input by processing the net input obtained 
from the summing function (Gülce, 2010; Kayıkçı, 2014). Activation functions are applied
separately to all of the nerve cells in the network, and the output value is calculated from the
value the function returns (Cirak, 2012). As with the summing function, different functions can
be used as the activation function. 
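
As a minimal sketch of how the summing junction and the activation function work together in a single artificial nerve cell, the net input can be computed as a weighted sum of the inputs and then passed through a nonlinear function. The function name, the bias term and the example numbers below are assumptions made for illustration; the hyperbolic tangent is used only because it is a common choice of transfer function.

```python
import math


def neuron_output(inputs, weights, bias=0.0):
    """Summing junction followed by an activation function for one cell.

    `bias` is an illustrative assumption; the text above describes only
    inputs and weights.
    """
    # Summing junction: linear combination of inputs and their weights.
    net_input = sum(w * x for w, x in zip(weights, inputs)) + bias
    # Activation (transfer) function: hyperbolic tangent as an example.
    return math.tanh(net_input)


# Hypothetical input signals (X1..X3) and synaptic weights (Wk1..Wk3).
x = [0.5, -1.0, 2.0]
w = [0.4, 0.3, -0.1]
print(neuron_output(x, w))
```

A different summing or activation function can be swapped in by changing the two commented lines, which mirrors the trial-and-error selection described above.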


Output Layer: In this layer, the information to which an appropriate activation function 
is applied is processed to produce the output required for the input data given in the first stage 
to the network. The output layer consists of a single layer, where the generated data are 
transmitted to the outside world. 


Although artificial neural networks are mainly built from these components, they are classified
in many different ways according to their intended use. In this study, the aim was to predict
elementary education teacher candidates’ achievements in the “Science and Technology Education
I and II” courses by using artificial neural networks. A further aim was to show the importance
levels of the independent variables in the prediction. The features and the construction of the
network architecture created in this framework are explained in the Method section. 


2. METHOD 
2.1. Sample 


The data in this study were obtained from the transcripts of elementary education teacher 
candidates graduating from four different state universities and from the demographic
information, stripped of personal information, found in student information systems.
Candidates who did not graduate within four years were not included in the sample. Moreover,
the data of students who entered or left the programs through a lateral or vertical transfer,
or through programs implemented by the Council of Higher Education of Turkey (such as Mevlana or
Erasmus Student Mobility), were removed and not included in the study. After these procedures,
the data of a total of 865 graduates were analyzed within the scope of the study. 


2.2. Data Analysis 




In the data set used in this study, variables of gender, type of education, field of study in 
high school and transcript information of 14 courses including end-of-term letter grades were 
collected and coded as categorical data. In the data analysis stage, these data were converted 
into numerical data using the methods described below. Categorical variables such as gender 
(male, female), type of education (daytime education, evening education) and field of study in 
high school (Turkish, Mathematics, Science, other) were converted into numerical data with the
1-of-N encoding method, such as 'male' = [1 0] and 'female' = [0 1]. In addition, the placement
score given by the Assessment Selection and Placement Center of Turkey (ASPC), which is 
another variable, was included in the data set in the same form as it was used by the ASPC 
during placement. 
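
The 1-of-N encoding described above can be sketched as follows. This is an illustration only: the function name and the ordering of the categories within each list are assumptions, apart from the 'male' = [1 0] and 'female' = [0 1] mapping stated in the text.

```python
def one_of_n(value, categories):
    """Encode a categorical value as a 1-of-N (one-hot) vector."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1 if value == category else 0 for category in categories]


# Category lists follow the variables named in the text; the order of
# the last two lists is an assumption.
GENDER = ["male", "female"]
EDUCATION_TYPE = ["daytime education", "evening education"]
HIGH_SCHOOL_FIELD = ["Turkish", "Mathematics", "Science", "other"]

print(one_of_n("male", GENDER))               # [1, 0]
print(one_of_n("female", GENDER))             # [0, 1]
print(one_of_n("Science", HIGH_SCHOOL_FIELD)) # [0, 0, 1, 0]
```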


When the end-of-term letter grades of the 14 courses, which constitute the second part of the
data set, were examined, it was seen that the same letter grades represented different
numerical grade ranges at different universities. For this reason, the score range that each
letter grade represented at each university was determined by checking the education and
examination regulations valid in the years in which the members of the sample graduated. Then,
the grades of each university were separately converted to a common standard, using the
intervals given in the Higher Education Council Grade Transformation Table. 


Once the data were converted, the dataset was divided into two parts to construct the 
network and test the problem situations that constituted the purpose of the study. The main 
purpose of this separation was to (1) determine the best network performance and then to (2) 
evaluate the performance of the prediction data that the constructed artificial neural network 
would generate in the presence of the new data set that it would encounter for the first time. In 
other words, this separation would allow the comparison of two things: (1) the output data that 
the artificial neural network would produce when it takes the independent variables as input 
data, assuming that the artificial neural network has learned the relationship between dependent 
and independent variables, and (2) the actual data of the dependent variable in the data set of 
the study. 


In this context, the graphical representation of all operations on the dataset is presented 
below and each step is explained in detail (Fig 3). 




Int. J. Asst. Tools in Educ., Vol. 5, No. 3, (2018) pp. 491-509 


[Figure 3 flowchart: (A) All Data (N=865) is split into (B) the Data Set to be Simulated and
(C) the Data Set on which the ANN was built; (C) is divided into (D) Learning Data Set and
(E) Testing Data Set; these lead to (F) Network Construction and (G) Predicted Data Set.]


Figure 3. Data set operations for network construction. 


Within the scope of this study, (A) all data of the N=865 people who constituted the sample 
were randomly divided into two parts, (B) “the Data Set to be Simulated” and (C) “the Data 
Set on which the ANN was built”, to construct a network using the Matlab Neural Network 
Toolbox. 


(B) “The Data Set to be Simulated” consisted of 200 individuals randomly selected from 
the 865 individuals constituting the sample. In general, at least 15% of the data set is used
as simulation data in studies (e.g., Bahadir, 2013; Başman, 2014; Demir, 2015), although there
is no method or standard for determining this number in the literature. In this study, 23%
(N=200) of the entire data set was reserved for comparing the simulation data with the actual
data after constructing the network. This data set was not seen by the system while the network
learned the relation between the dependent and independent variables. In other words, the
artificial neural network was constructed from the data of the remaining 665 people. 


(C) “The Data Set on which the ANN was built” consisted of the 665 people remaining after the
data of the 200 people reserved for later simulation were set aside. This data set included the 
following data, which were thought to be predictive of the achievement in the Science and 
Technology Education course: the 665 participants’ gender, field of study in high school, ASPC 
placement score, and year of graduation, as well as the quantified letter grades of the General 
Biology, Introduction to Educational Science, General Chemistry, Educational Psychology, 
General Physics, Science and Technology Laboratory Applications I, Instructional Principles 
and Methods, Science and Technology Laboratory Applications II, and Environmental 
Education courses. When the network learns within this data set, it makes random selections
from the data and tries to learn the relationships between dependent and independent variables
by using the learning functions determined by the researcher. It then produces simulation data
by testing the performance of the network it has constructed. The learning and testing stages
are described in detail below. 


(D) In the “Learning Data Set”, the system tests the user-specified learning functions 
during network learning by using 15% of the participants, randomly selected from the data set
given to it. This 15% rate can be increased or decreased by the researcher. In this study, 15%
was used as the default setting for network learning. At this stage, the network tries to
discover the relations between the dependent and independent variables in the data set with the
selected 15% portions in order to construct the expected output values. 


(E) A “Testing Data Set” is constructed to test systematically, in sets of 15% of the data, the
relations established in the network learning stage, which is the preceding stage. As in the
learning data set, the network selects these data randomly in order to test its own
performance. 


(F) In the “Network Construction” stage, the network — having learned and tested the 
relationships between dependent and independent variables — predicts the possible outputs of 
the data to be inputted by the user. Here, the input data of 200 people reserved in the first stage 
(B) were fed to the network and the outputs were predicted by the network. 


(G) A statistically significant difference was sought by comparing the output data 
generated in the previous stage to the actual data reserved in stage B. 


To summarize all these steps, the entire dataset was divided into two parts: the network 
construction and the comparison of the actual data to the data to be simulated by the constructed 
network. In this separation process, the data set to be predicted was not included in the network 
in order to prevent the network from memorizing the outputs. The data set on which the network 
was constructed was randomly separated by the system to test the network learning and the 
conditions it has learned, and a network architecture was constructed. It was ensured that the 
constructed network architecture predicted the outputs by loading only the input data of the 
dataset reserved in the first stage. Finally, the data that were modeled and predicted by the 
network were compared with their actual values. 
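
The first separation summarized above, in which 200 of the 865 records are set aside before the network is built, can be sketched as a random hold-out split. The function name and the fixed random seed are assumptions for illustration; the study itself performed this step with the Matlab Neural Network Toolbox, and the sketch does not reproduce the toolbox's internal learning and testing subdivision.

```python
import random


def hold_out_split(n_total=865, n_holdout=200, seed=0):
    """Randomly split record indices into a building set and a hold-out set."""
    rng = random.Random(seed)          # fixed seed only for reproducibility
    indices = list(range(n_total))
    rng.shuffle(indices)
    simulate = indices[:n_holdout]     # never shown to the network
    build = indices[n_holdout:]        # used to construct the network
    return build, simulate


build_idx, simulate_idx = hold_out_split()
print(len(build_idx), len(simulate_idx))          # 665 200
print(round(100 * len(simulate_idx) / 865, 1))    # 23.1
```

Keeping the hold-out indices disjoint from the building indices is what prevents the network from memorizing the outputs it is later asked to predict.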


All of these operations on the dataset were done to create the best network architecture. 
However, the most important thing to know here is that there is more than one possible network 
architecture in artificial neural networks. The procedures for creating the best network
architecture, considering the properties of the problem situation and the variables, are
explained in detail in the next section. 


2.3. Construction of the Network Architecture 


There are hundreds of options for creating network architectures used in modeling and 
prediction with artificial neural networks. This feature, which provides the researcher with 
flexibility in selecting the components to be used in network construction (such as network 
type, learning algorithm, and transfer function), sometimes forces the researcher to make many
attempts at finding the best combination of components for a proper network selection. 
In these attempts, the goal is to establish a network structure that learns the desired output values 
when it encounters a new data set, or in other words, learns the relationships between the
dependent and independent variables of the research problem in the best way possible. A network 
architecture that solves the research problem in the best way possible can be described as the 
architecture that best learns the relationship between existing inputs and outputs. In order to 
construct the most suitable network architecture, it is necessary, during the component
selection stage, to consider many variables such as the hardware characteristics of the 
computer on which the network is tried, the type of variables used, and the characteristics of
the desired output data. Given this diversity, many network architectures should be tried for
any modeling task, and the architecture that gives the best result should be preferred. 


[Figure 4 diagram: layers labeled 10 Neurons, 30 Neurons, and 1 Neuron.]


Figure 4. Constructed Network Architecture 


Within the scope of this study, 146 different network architectures were constructed to 
obtain the most suitable network architecture. Each constructed architecture was tested 10 
times. Among these 10 trials, the highest performance verification value was recorded together 
with the R values of the learning, verification and test processes that were created during the 
network learning stage. Therefore, when selecting the most suitable network architecture for
the problem situation, a total of 1460 trials (146 architectures, each tested 10 times) were
investigated. The lowest Mean-Square Error (MSE) value was taken as the criterion for selecting
the most suitable network. 
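
The selection procedure described above, in which every candidate architecture is tried several times and the run with the lowest MSE is kept, can be sketched as a simple search loop. Everything in this sketch is hypothetical: `placeholder_score` is a toy stand-in for actually training a network, chosen only so that the example has a known minimum, and the candidate list is far smaller than the 146 architectures used in the study.

```python
def select_best_architecture(architectures, train_and_score, n_trials=10):
    """Return the (mse, architecture, trial) with the lowest MSE and the run count."""
    best = None
    n_runs = 0
    for arch in architectures:
        for trial in range(n_trials):
            mse = train_and_score(arch, trial)
            n_runs += 1
            if best is None or mse < best[0]:
                best = (mse, arch, trial)
    return best, n_runs


def placeholder_score(arch, trial):
    # Hypothetical stand-in for "train the network, return its MSE".
    # It is NOT a real training routine; it only gives the loop something to rank.
    h1, h2 = arch
    return abs(h1 - 10) / 10 + abs(h2 - 30) / 30 + 0.01 * trial


# Hypothetical candidates: (neurons in hidden layer 1, neurons in hidden layer 2).
candidates = [(h1, h2) for h1 in (5, 10, 20) for h2 in (10, 30)]
best, n_runs = select_best_architecture(candidates, placeholder_score)
print(best, n_runs)
```

With 146 architectures and 10 trials each, the same loop would perform 146 × 10 = 1460 runs.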


The table below shows the characteristics of the network with the best performance value 
after the trials (Table 1). 


Table 1. Characteristics of the Network Architecture (see Fig 4). 


Network Type                    Cascade-forward Backpropagation
Adaptation Learning Function    Momentum Weights Gradient Descent
Performance Function            Mean-Square Error
Transfer Function               Hyperbolic Tangent Sigmoid
Number of Hidden Layers         2
Number of Neurons               Hidden Layer 1: 10; Hidden Layer 2: 30
Learning Function               Levenberg-Marquardt


After setting the network type, the stage for the determination of the learning function 
began. At this stage, Levenberg-Marquardt (TRAINLM), the gradient descent (TRAINGD), 
and the Powell-Beale restart conjugate gradient descent learning algorithm (TRAINCGB), 
which yield fast results for nonlinear problems, were tested separately in the architectures. 
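
To make the selected architecture more concrete, the following is a minimal sketch of the forward pass of a cascade-forward network with two hidden layers of 10 and 30 neurons and a hyperbolic tangent transfer function. In a cascade-forward network each layer receives the original inputs together with the outputs of all earlier layers. The number of inputs (13), the random initial weights, the zero biases and the linear output neuron are assumptions for illustration, and no training step (such as Levenberg-Marquardt) is shown.

```python
import math
import random


def make_layer(n_out, n_in, rng):
    """Create a layer as (weights, biases) with small random weights."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def apply_layer(layer, vec):
    """Weighted sum of `vec` for every neuron in the layer."""
    weights, biases = layer
    return [sum(w * v for w, v in zip(row, vec)) + b for row, b in zip(weights, biases)]


def build_cascade_network(n_inputs, hidden=(10, 30), seed=0):
    rng = random.Random(seed)
    h1, h2 = hidden
    return [
        make_layer(h1, n_inputs, rng),            # hidden layer 1 sees the inputs
        make_layer(h2, n_inputs + h1, rng),       # hidden layer 2 sees inputs + layer 1
        make_layer(1, n_inputs + h1 + h2, rng),   # output sees inputs + both hidden layers
    ]


def cascade_forward(network, x):
    layer1, layer2, out_layer = network
    x = list(x)
    a1 = [math.tanh(v) for v in apply_layer(layer1, x)]
    a2 = [math.tanh(v) for v in apply_layer(layer2, x + a1)]
    return apply_layer(out_layer, x + a1 + a2)[0]   # linear output (assumption)


net = build_cascade_network(n_inputs=13)
print(cascade_forward(net, [0.1] * 13))
```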


3. FINDINGS 


3.1. Modeling and Estimation of Achievement in Science and Technology Education I and 
II Courses using Artificial Neural Networks 


When the network performances of the architectures prepared to predict achievements in the
Science and Technology Education I (Network 1) and II (Network 2) courses and to ensure
learning of the network were examined, it was seen that the mean-square error values were
MSE(Network 1)=0.478 and MSE(Network 2)=0.427. The table showing the regression coefficients
for learning, verification and testing of the artificial neural network is given below
(Table 2). 


Table 2. Network Performance and Regression Values 


Network   MSE       R (Learning)   R (Verification)   R (Test)   R (Total)
1         0.47754   0.81632        0.90097            0.87808    0.83774
2         0.42740   0.81917        0.89856            0.83345    0.83679


When the above table is examined, it can be seen that the network regression values were 
R=0.82 for the learning stage, R=0.90 for the verification stage and R=0.88 for the test stage in 
Network 1; and R=0.82 for the learning stage, R=0.90 for the verification stage and R=0.83 for 
the test stage in Network 2. The total R value, which is the other regression coefficient given 
above, was obtained by introducing the input values of the 665 people to the network after the 
network learning stage, estimating the output values and correlating these values with the
actual values. It is thus seen that there was a statistical correlation between the Science and
Technology Education I course grades produced by the network and the actual grades (R=0.84),
and between the Science and Technology Education II course grades produced by the network and
the actual grades (R=0.84). These values can be interpreted as a high correlation between the
data produced by the network and the actual data. Based on this, it can be said that the
network accomplished a successful modeling. A graphical comparison of the actual grades of the
people with the grades predicted by the network is given below (Fig 5 & Fig 6). 


Figure 5. Science and Technology Education I Course Observed and Predicted Values 


Figure 6. Science and Technology Education II Course Observed and Predicted Values 


After the establishment of the network architecture, the achievement grades in the Science 
and Technology Education I and II courses were estimated by using the input values of the 200 
people, which were reserved at the beginning of the study and had not been seen by the network 
before, together with the existing network structure and connections. The predicted grades
were compared with the actual data set, and correlations for Network 1 (r=.68, p<0.001) and for
Network 2 (r=.69, p<0.001) were obtained. There is no general rule for evaluating the
correlation coefficient, but according to Taylor (1990) it is possible to describe a
relationship between .68 and 1 as a high or strong relation. This finding is an indication that
the constructed artificial neural network learned, at a good level, to produce the achievement
grades in the Science and Technology Education I and II courses, which were the output values,
from the input values of the people. 
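
The comparison between predicted and actual grades rests on the Pearson correlation coefficient, which can be computed as in the sketch below. The function name and the example grade lists are hypothetical and are not taken from the study's data; the significance test (p value) reported above is not reproduced here.

```python
import math


def pearson_r(actual, predicted):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_a * var_p)


# Hypothetical grades for five hold-out students (illustration only).
actual_grades = [2.0, 2.5, 3.0, 3.5, 4.0]
predicted_grades = [2.2, 2.4, 3.3, 3.1, 3.9]
print(round(pearson_r(actual_grades, predicted_grades), 2))
```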


3.2. Importance of the Independent Variables Used to Estimate the Achievement in the Science 
and Technology Education I and II Courses 


The final finding obtained within the scope of this study concerns the importance of the
independent variables used to predict the dependent variables. In this context, the importance
values of the independent variables for the Science and Technology Education I course are given
in Table 3. When Table 3 is examined, it is seen that the placement score (100%) was the most
important variable affecting the dependent variable. The variables with a normalized importance
level greater than 50% were Introduction to Educational Science (71.80%), Science and Technology 
Laboratory Applications I (68.40%), General Physics (58.30%), Instructional Principles and 
Methods (53.80%), General Chemistry (53.00%) and Educational Psychology (50.70%). 
The variables with an importance level below 50% were Science and Technology 
Laboratory Applications II (48.20%), Gender (45.90%), General Biology (44.90%), 
Environmental Education (43.40%), Field of Study in High School (40.20%) and Type of 
Education (35.80%). A bar graph showing the normalized importance of the independent variables
is given below. 


Table 3. Normalized Importance of the Independent Variables for Science and Technology Education I Course 


Independent Variable                                  Importance   Normalized Importance*
Placement Score                                       .14          100.00%
Introduction to Educational Science                   .101         71.80%
Science and Technology Laboratory Applications I      .096         68.40%
General Physics                                       .082         58.30%
Instructional Principles and Methods                  .075         53.80%
General Chemistry                                     .074         53.00%
Educational Psychology                                .071         50.70%
Science and Technology Laboratory Applications II     .067         48.20%
Gender                                                .064         45.90%
General Biology                                       .063         44.90%
Environmental Education                               .061         43.40%
Field of Study in High School                         .056         40.20%
Type of Education                                     .05          35.80%


* Importance values divided by the largest importance values and expressed as percentages. 


As can be seen above, 13 variables, which were defined as the independent variables for 
the Science and Technology Education I course, were ranked in descending order according to 
their normalized importance levels. 
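
The normalization described in the table note, in which each importance value is divided by the largest one and expressed as a percentage, can be sketched as follows. The function name is an assumption, and only three of the thirteen variables are used. Because the importance values in Table 3 are printed to three decimals, percentages recomputed from them can differ slightly from the published normalized values (for example about 72.1% instead of 71.80%).

```python
def normalize_importance(importance):
    """Divide every importance by the largest one and express it as a percentage."""
    top = max(importance.values())
    normalized = {name: round(value / top * 100, 1) for name, value in importance.items()}
    # Rank the variables in descending order of normalized importance.
    return dict(sorted(normalized.items(), key=lambda item: item[1], reverse=True))


# Rounded importance values as printed in Table 3 (subset for illustration).
table3_subset = {
    "Gender": 0.064,
    "Placement Score": 0.140,
    "Introduction to Educational Science": 0.101,
}
for name, percent in normalize_importance(table3_subset).items():
    print(f"{name}: {percent}%")
```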

The importance values of the independent variables for the Science and Technology Education II 
course are given in Table 4. 


Table 4. Normalized Importance of the Independent Variables for Science and Technology Education II 
Course 


Independent Variable                                  Importance   Normalized Importance*
Placement Score                                       .108         100.00%
Instructional Principles and Methods                  .1           92.2%
General Biology                                       .071         65.8%
General Physics                                       .071         65.8%
Classroom Management                                  .07          64.2%
Science and Technology Laboratory Applications I      .066         61.00%
Introduction to Educational Science                   .061         56.1%
Measurement and Assessment                            .056         51.9%
General Chemistry                                     .055         51.00%
Educational Psychology                                .052         48.2%
Science and Technology Education I                    .051         47.3%
Instructional Technologies and Material Design        .051         47.00%
Science and Technology Laboratory Applications II     .044         40.2%
Environmental Education                               .041         37.6%
Type of Education                                     .037         34.5%
Field of Study in High School                         .035         32.7%
Gender                                                .03          27.6%


* Importance values divided by the largest importance values and expressed as percentages. 


When Table 4 is examined, it is seen that the most important variable affecting the 
Science and Technology Education II course was the Placement Score (100%), which was also 
the most important variable that predicted the Science and Technology Education I course. The 
variables with a normalized importance level greater than 50% were the achievement grades of 
the Instructional Principles and Methods (92.2%), General Biology (65.8%), General Physics 
(65.8%), Classroom Management (64.2%), Science and Technology Laboratory Applications I 
(61.00%), Introduction to Educational Science (56.1%), Measurement and Assessment (51.9%) 
and General Chemistry (51.00%) courses. They were followed by Educational Psychology 
(48.2%), Science and Technology Education I (47.3%), Instructional Technologies and 
Material Design (47.00%), Science and Technology Laboratory Applications II (40.2%), 
Environmental Education (37.6%), Type of Education (34.5%), Field of Study in High School 
(32.7%) and Gender (27.6%). 


4. DISCUSSION AND CONCLUSION 


When the literature on the use of artificial neural networks in the field of education and 
training is examined, it is found that even though there are several studies in the 
international literature (Ibrahim & Rusli, 2007; Karamouzis & Vrettos, 2008; Oladokun, 
Adebanjo, & Charles-Owaba, 2008), far less attention has been paid to the prediction and 
classification of student achievement in Turkey. However, regression analysis types are
frequently used (Acil, 2010; Bahar, 2011; Baştürk, 2008; Dogan & Sahin, 2009; Kablan, 2010;
Kösterelioğlu, Kösterelioğlu, & Kilmen, 2008) in prediction studies related to education and
training in Turkey. 


Some of the artificial neural network studies conducted in Turkey have been directed 
towards the prediction of standard tests such as TEOG (Sen, Uçar, & Delen, 2012), KPSS 
(Demir, 2015), and PISA (Tepehan, 2011) as well as students’ course scores (Turhan, Kurt, & 
Engin, 2013) and general academic scores (Şengür & Tekin, 2013). In general, two types of 
benchmarking studies on artificial neural networks are widely available in the field of 
education and training. The first is the type of studies by Ayik, Ozdemir and Yavuz 
(2007), Ibrahim and Rusli (2007), Şengür (2013), Şengür and Tekin (2013), Tosun (2007), and 
Vandamme, Meskens and Superby (2007), which are based on a comparison of the prediction 
performances of decision trees and artificial neural networks, including classification
analyses from data mining models. The second is the type of studies by Bahadir (2013), Cirak
(2012), Guo (2010), Tepehan (2011), and Turhan, Kurt and Engin (2013), which compare the
prediction performances of regression models with those of artificial neural networks. Other
than these, there are also studies on the prediction and classification of student 
achievement using artificial neural networks without performance comparisons (Demir, 2015; 
Karamouzis & Vrettos, 2008; Kardan, Sadeghi, Ghidary, & Sani, 2013; Naser, Zaqout, Ghosh, 
Atallah, & Alajrami, 2015; Oancea, Dragoescu, & Ciucu, 2013; Oladokun, Adebanjo, & 
Charles-Owaba, 2008). 


Overall, research has mainly focused on comparing the performances 
of artificial neural networks, decision trees and regression models. It can be argued that this is 
because researchers have conducted such studies primarily to measure and 
demonstrate the effectiveness of the statistical methodology. In this study, the main purpose 
was the prediction of Science and Technology Education course achievements of elementary 
education teacher candidates. The results show that the most important variable 
in the prediction of the Science and Technology Education I and II courses was the candidates’ 
university placement score. In contrast, Sitturug (1997) found that 
scientific process skills were the most effective variable in predicting the science achievement of 
teacher candidates. The main reason for the difference between these two studies is the 
difference in the type and number of independent variables employed in the prediction of 
science achievement. In addition, the fact that the sample in Sitturug’s 
(1997) study consisted of 80 people and that the analysis was carried out through regression 
can be considered another factor behind this difference. Similarly, in a 
study conducted by Anil (2009) with a purpose parallel to that of the current study, the most 
important variable in predicting students’ science achievement in the PISA test was the father’s 
educational status. Moreover, Ceylan and Berberoglu (2007) concluded that the most important 
variable in explaining the science achievement of the students participating in the TIMSS test 
was the students’ perceptions of failure. In the study by Ceylan and Berberoglu 
(2007), parental education level was not included as an independent variable. In addition, the 
above-mentioned studies differ from the current study in that 
they were carried out on students under the age of 15, employed 
regression analyses and structural equation modeling, and aimed to predict 
standard test scores such as PISA and TIMSS. 


Numerous studies on predicting the science achievement of teacher candidates as well 
as students show that variables such as attitudes towards science and technology, parental 
education level, socio-economic level, and scientific process skills are related to science 
achievement and that they have an important effect in predicting the science achievement of 
students (Berberoglu, Celebi, Ozdemir, Uysal, & Yayan, 2003; Fleming & Malone, 1983; 
Germann, 1994; Schibeci & Riley, 1986). 


Likewise, in studies where artificial neural networks are used to predict academic 
achievement, researchers have incorporated variables that are expected to influence course 
success or general academic achievement into their models as independent 
variables and have created different network architectures. This also leads to differences in 
research results. The main cause of these differences is that researchers choose different types 
of input variables, as mentioned above. In addition, the number of layers, the 
number of neurons and the learning function are varied when searching for the best model with 
artificial neural networks, which increases the number of combinations to be tried. This large 
number of combinations allows researchers to create a large number of network architectures 
in their studies, which leads to different analyses and different outcomes in similar studies, as 
is the case in the prediction of academic achievement. Nevertheless, in studies on education and 
training that compare artificial neural networks with other statistical methods (such as linear 
regression, logistic regression and decision trees), the lowest classification prediction and the 
highest network performance reported are 51.88% and 91.77%, respectively 
(Bahadir, 2013; Cirak, 2012; 
Gilcin, Cirak, & Cokluk, 2013; Demir, 2015; Guo, 2010; Ibrahim & Rusli, 2007; Karamouzis 
& Vrettos, 2008; Kardan, Sadeghi, Ghidary, & Sani, 2013; Moridis & Economides, 2009; 
Naser, Zaqout, Ghosh, Atallah, & Alajrami, 2015; Oancea, Dragoescu, & Ciucu, 2013; 
Oladokun, Adebanjo, & Charles-Owaba, 2008; Paliwal & Kumar, 2009; Romero, Ventura, & 
Garcia, 2008; Rusli, Ibrahim, & Janor, 2008; Sen, Uçar, & Delen, 2012; Sengür, 2013; Sengür 
& Tekin, 2013; Tepehan, 2011; Tosun, 2007; Turhan, Kurt, & Engin, 2013; Vandamme, 
Meskens, & Superby, 2007). 
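The growth in the number of candidate architectures described above can be made concrete with a short sketch. The option lists below are hypothetical and are not the settings used in this study or in the cited studies; they only illustrate how a few choices per setting multiply into many trials.

```python
from itertools import product

# Hypothetical search space (assumed for illustration only).
hidden_layers = [1, 2]
neurons_per_layer = [5, 10, 15, 20]
learning_functions = [
    "gradient_descent",
    "scaled_conjugate_gradient",
    "levenberg_marquardt",
]

# Every combination of the three settings is one candidate architecture.
architectures = list(product(hidden_layers, neurons_per_layer, learning_functions))

print(len(architectures))  # 2 * 4 * 3 = 24 candidate networks
```

Adding one more option to any single list increases the total multiplicatively, which is one reason why two studies on a similar problem can end up with different final networks.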


The artificial neural network performance in this study was R=0.84 for the 
Science and Technology Education I course and R=0.84 for the Science and Technology 
Education II course, which is consistent with the findings obtained 
from the above studies. Both in this 
study and in the studies mentioned above, the results obtained using artificial neural networks 
in the prediction of academic achievement can be considered quite good, particularly for the social sciences. 
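As a rough illustration of how a performance value such as R can be obtained, the sketch below computes the Pearson correlation between observed and predicted values. This assumes that R denotes the correlation between network outputs and target values, which this section does not state explicitly; the data are hypothetical.

```python
import math


def pearson_r(observed, predicted):
    """Pearson correlation between observed targets and model predictions."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_o * var_p)


# Hypothetical grades and predictions (illustrative only).
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [2.0, 4.0, 6.0, 8.0]  # perfectly linear in the observed values
r = pearson_r(observed, predicted)
print(round(r, 4))
```

Under this reading, R=0.84 would correspond to R squared of about 0.71, that is, roughly 71% of the variance in the observed values being shared with the predictions.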